Overview

Dataset statistics

Number of variables: 16
Number of observations: 283004
Missing cells: 609
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 17.4 MiB
Average record size in memory: 64.4 B
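The headline figures can be cross-checked with a little arithmetic. The sketch below is illustrative only: it reuses numbers printed in this report (the per-variable missing counts come from the Region, Country and CountryCode sections further down) and assumes that a "cell" is one value of one variable.

```python
# Cross-check of the dataset statistics, using only figures printed in the report.
N_VARIABLES = 16
N_OBSERVATIONS = 283004
MISSING_CELLS = 609
AVG_RECORD_BYTES = 64.4  # rounded value as printed

# Missing counts reported per variable (all other variables report 0).
missing_by_variable = {"Region": 605, "Country": 2, "CountryCode": 2}

# Assumption: one cell = one value of one variable.
total_cells = N_VARIABLES * N_OBSERVATIONS
missing_pct = 100 * MISSING_CELLS / total_cells

# Approximate total memory, derived from the rounded average record size.
approx_total_mib = AVG_RECORD_BYTES * N_OBSERVATIONS / 2**20

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.4f}%")
print(f"approximate memory: {approx_total_mib:.1f} MiB")
```

Because the average record size is itself rounded, the recomputed memory figure only agrees with the reported total to about one decimal place.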

Variable types

Numeric: 9
Categorical: 7

Warnings

SC has a high cardinality: 2649 distinct values
Country has a high cardinality: 177 distinct values
CountryCode has a high cardinality: 177 distinct values
ArtsHumanities is highly skewed (γ1 = 42.58223174)
TCperYear is highly skewed (γ1 = 89.11640174)
NumAuthors is highly skewed (γ1 = 21.01065718)
ArtsHumanities has 282765 (99.9%) zeros
LifeSciencesBiomedicine has 238163 (84.2%) zeros
PhysicalSciences has 219268 (77.5%) zeros
SocialSciences has 274870 (97.1%) zeros
Technology has 54286 (19.2%) zeros
TCperYear has 77342 (27.3%) zeros
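A rough sketch of how "Skewed" and "Zeros" flags like the ones above could be derived for a numeric column. The function names and both thresholds (20 for skewness, 10% for zeros) are assumptions made for illustration; they are consistent with which variables are flagged in this report but are not taken from it. The skewness here is the plain population formula (third standardized moment), which may differ slightly from a bias-corrected estimate on small samples.

```python
def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


def skewness(values):
    """Population skewness: m3 / m2**1.5 (undefined for constant data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def flags(values, skew_threshold=20.0, zero_threshold=0.10):
    """Return warning labels for one numeric column (thresholds are assumed)."""
    found = []
    if abs(skewness(values)) > skew_threshold:
        found.append("Skewed")
    if zero_share(values) > zero_threshold:
        found.append("Zeros")
    return found


# Toy column: almost all zeros plus one large value, loosely like TCperYear.
toy = [0] * 999 + [1000]
print(flags(toy))
```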

Reproduction

Analysis started: 2021-01-11 16:15:51.477294
Analysis finished: 2021-01-11 16:16:32.123145
Duration: 40.65 seconds
Software version: pandas-profiling v2.10.0
Download configuration: config.yaml
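The reported duration follows directly from the two timestamps. A minimal check with the standard library, using the values exactly as printed:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2021-01-11 16:15:51.477294")
finished = datetime.fromisoformat("2021-01-11 16:16:32.123145")

duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```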

Variables

PY
Real number (ℝ≥0)

Distinct: 29
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2010.61726
Minimum: 1990
Maximum: 2018
Zeros: 0
Zeros (%): 0.0%
Memory size: 552.9 KiB

Quantile statistics

Minimum: 1990
5-th percentile: 1999
Q1: 2006
median: 2012
Q3: 2016
95-th percentile: 2018
Maximum: 2018
Range: 28
Interquartile range (IQR): 10
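For readers who want to reproduce a quantile block like this one, here is a small standard-library sketch. The helper name `quantile_summary` is hypothetical, and the choice of `method="inclusive"` (linear interpolation between order statistics) is an assumption about how the report's percentiles were computed; other interpolation rules give slightly different values on small samples.

```python
from statistics import quantiles


def quantile_summary(values):
    """Min, 5th/25th/50th/75th/95th percentiles, max, range and IQR."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    twentieths = quantiles(data, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    return {
        "min": data[0],
        "p5": twentieths[0],
        "q1": q1,
        "median": med,
        "q3": q3,
        "p95": twentieths[-1],
        "max": data[-1],
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }


summary = quantile_summary([1, 2, 3, 4, 5])
print(summary)
```

Applied to the figures above, the same definitions give Range = 2018 - 1990 = 28 and IQR = 2016 - 2006 = 10.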

Descriptive statistics

Standard deviation: 6.263750677
Coefficient of variation (CV): 0.003115337166
Kurtosis: -0.3732993416
Mean: 2010.61726
Median Absolute Deviation (MAD): 5
Skewness: -0.7143466555
Sum: 569012727
Variance: 39.23457254
Monotonicity: Not monotonic
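Several of these descriptive statistics are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. The sketch below checks them for PY using the rounded values printed in this report, so agreement is only expected up to rounding.

```python
# Values copied from the PY section and the dataset overview.
n_observations = 283004
mean = 2010.61726
std = 6.263750677
variance = 39.23457254
cv = 0.003115337166
total = 569012727

cv_from_parts = std / mean          # coefficient of variation
variance_from_std = std ** 2        # variance is the squared standard deviation
sum_from_mean = mean * n_observations

print(f"CV: {cv_from_parts:.12f} (reported {cv})")
print(f"variance: {variance_from_std:.8f} (reported {variance})")
print(f"sum: {sum_from_mean:.1f} (reported {total})")
```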
Histogram with fixed size bins (bins=29)
Common values

Value              Count   Frequency (%)
2018               33291   11.8%
2017               28162   10.0%
2016               21970   7.8%
2015               19805   7.0%
2014               16800   5.9%
2013               14872   5.3%
2012               13761   4.9%
2009               13662   4.8%
2008               12614   4.5%
2011               12035   4.3%
Other values (19)  96032   33.9%

Minimum 5 values

Value  Count  Frequency (%)
1990   72     < 0.1%
1991   390    0.1%
1992   674    0.2%
1993   834    0.3%
1994   994    0.4%

Maximum 5 values

Value  Count  Frequency (%)
2018   33291  11.8%
2017   28162  10.0%
2016   21970  7.8%
2015   19805  7.0%
2014   16800  5.9%

SC
Categorical

HIGH CARDINALITY

Distinct: 2649
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 653.6 KiB

Most frequent values:
Computer Science: 43160
Computer Science; Engineering: 25349
Engineering: 20686
Physics: 5580
Automation & Control Systems; Engineering: 5568
Other values (2644): 182661

Length

Max length: 188
Median length: 29
Mean length: 33.81959972
Min length: 3

Characters and Unicode

Total characters: 9571082
Distinct characters: 49
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 592
Unique (%): 0.2%

Sample

1st row: Computer Science; Engineering
2nd row: Computer Science; Engineering
3rd row: Computer Science; Engineering
4th row: Computer Science; Engineering
5th row: Computer Science; Engineering

Common values

Value                                              Count   Frequency (%)
Computer Science                                   43160   15.3%
Computer Science; Engineering                      25349   9.0%
Engineering                                        20686   7.3%
Physics                                            5580    2.0%
Automation & Control Systems; Engineering          5568    2.0%
Computer Science; Neurosciences & Neurology        4586    1.6%
Science & Technology - Other Topics                4464    1.6%
Computer Science; Engineering; Telecommunications  3972    1.4%
Automation & Control Systems; Computer Science     3919    1.4%
Mathematics                                        3864    1.4%
Other values (2639)                                161856  57.2%
Histogram of lengths of the category
ValueCountFrequency (%)
science166765
15.2%
149290
 
13.6%
computer120642
 
11.0%
engineering119602
 
10.9%
technology27923
 
2.5%
systems25785
 
2.3%
control25785
 
2.3%
automation25785
 
2.3%
sciences15705
 
1.4%
physics15381
 
1.4%
Other values (201)405750
36.9%
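The word table above counts lowercase tokens across all SC values. A minimal sketch of that kind of count follows; the tokenization rule (lowercase, keep runs of letters only) and the helper name `word_counts` are assumptions, since the report does not state how it splits values into words.

```python
import re
from collections import Counter


def word_counts(values):
    """Count lowercase alphabetic tokens over an iterable of strings."""
    counts = Counter()
    for value in values:
        counts.update(re.findall(r"[a-z]+", value.lower()))
    return counts


# Toy input built from category strings that appear in this report.
sample = [
    "Computer Science; Engineering",
    "Computer Science",
    "Engineering",
    "Automation & Control Systems; Engineering",
]
counts = word_counts(sample)
print(counts.most_common(3))
```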

Most occurring characters

ValueCountFrequency (%)
e1129281
 
11.8%
n861036
 
9.0%
815409
 
8.5%
i758086
 
7.9%
c642036
 
6.7%
o610192
 
6.4%
t500855
 
5.2%
r485700
 
5.1%
g403263
 
4.2%
s353199
 
3.7%
Other values (39)3012025
31.5%

Most occurring categories

Value              Count    Frequency (%)
Lowercase Letter   7417224  77.5%
Uppercase Letter   949152   9.9%
Space Separator    815409   8.5%
Other Punctuation  377503   3.9%
Dash Punctuation   11794    0.1%
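The category names in this table correspond to Unicode general categories (Ll, Lu, Zs, Po, Pd). A short sketch of how such a breakdown can be computed with Python's standard `unicodedata` module; the helper name and the code-to-label mapping are written out here for illustration and cover only the five categories seen in SC.

```python
import unicodedata
from collections import Counter

# Two-letter general-category codes mapped to the labels used in the table.
CATEGORY_LABELS = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
}


def category_counts(text):
    """Count characters of `text` by Unicode general category label."""
    codes = Counter(unicodedata.category(ch) for ch in text)
    return {CATEGORY_LABELS.get(code, code): n for code, n in codes.items()}


result = category_counts("Science & Technology - Other Topics")
print(result)
```

In this scheme `;`, `&` and `,` fall under Other Punctuation, while the hyphen is Dash Punctuation, which matches the per-category character lists in this section.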

Most frequent character per category

ValueCountFrequency (%)
e1129281
15.2%
n861036
11.6%
i758086
10.2%
c642036
8.7%
o610192
8.2%
t500855
 
6.8%
r485700
 
6.5%
g403263
 
5.4%
s353199
 
4.8%
m328455
 
4.4%
Other values (13)1345121
18.1%
ValueCountFrequency (%)
S219962
23.2%
C167753
17.7%
E158137
16.7%
M69861
 
7.4%
T59433
 
6.3%
A44479
 
4.7%
P39813
 
4.2%
I39020
 
4.1%
R34894
 
3.7%
O29973
 
3.2%
Other values (11)85827
 
9.0%
ValueCountFrequency (%)
;236197
62.6%
&137589
36.4%
,3717
 
1.0%
ValueCountFrequency (%)
815409
100.0%
ValueCountFrequency (%)
-11794
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8366376
87.4%
Common1204706
 
12.6%

Most frequent character per script

ValueCountFrequency (%)
e1129281
13.5%
n861036
 
10.3%
i758086
 
9.1%
c642036
 
7.7%
o610192
 
7.3%
t500855
 
6.0%
r485700
 
5.8%
g403263
 
4.8%
s353199
 
4.2%
m328455
 
3.9%
Other values (34)2294273
27.4%
ValueCountFrequency (%)
815409
67.7%
;236197
 
19.6%
&137589
 
11.4%
-11794
 
1.0%
,3717
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII9571082
100.0%

Most frequent character per block

ValueCountFrequency (%)
e1129281
 
11.8%
n861036
 
9.0%
815409
 
8.5%
i758086
 
7.9%
c642036
 
6.7%
o610192
 
6.4%
t500855
 
5.2%
r485700
 
5.1%
g403263
 
4.2%
s353199
 
3.7%
Other values (39)3012025
31.5%

ArtsHumanities
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0005465428387
Minimum: 0
Maximum: 1
Zeros: 282765
Zeros (%): 99.9%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 1
Range: 1
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.02078563022
Coefficient of variation (CV): 38.0311089
Kurtosis: 1923.509438
Mean: 0.0005465428387
Median Absolute Deviation (MAD): 0
Skewness: 42.58223174
Sum: 154.6738095
Variance: 0.0004320424236
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
Common values

Value         Count   Frequency (%)
0             282765  99.9%
1             96      < 0.1%
0.5           91      < 0.1%
0.3333333333  20      < 0.1%
0.25          13      < 0.1%
0.2           7       < 0.1%
0.1666666667  6       < 0.1%
0.1428571429  6       < 0.1%

Minimum 5 values

Value         Count   Frequency (%)
0             282765  99.9%
0.1428571429  6       < 0.1%
0.1666666667  6       < 0.1%
0.2           7       < 0.1%
0.25          13      < 0.1%

Maximum 5 values

Value         Count  Frequency (%)
1             96     < 0.1%
0.5           91     < 0.1%
0.3333333333  20     < 0.1%
0.25          13     < 0.1%
0.2           7      < 0.1%

LifeSciencesBiomedicine
Real number (ℝ≥0)

ZEROS

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1112536129
Minimum: 0
Maximum: 1
Zeros: 238163
Zeros (%): 84.2%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 1
Range: 1
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2803793536
Coefficient of variation (CV): 2.520182008
Kurtosis: 4.616926168
Mean: 0.1112536129
Median Absolute Deviation (MAD): 0
Skewness: 2.452339835
Sum: 31485.21746
Variance: 0.07861258191
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)
Common values

Value             Count   Frequency (%)
0                 238163  84.2%
1                 20311   7.2%
0.5               12289   4.3%
0.3333333333      7337    2.6%
0.6666666667      2388    0.8%
0.25              1300    0.5%
0.6               698     0.2%
0.75              126     < 0.1%
0.2               111     < 0.1%
0.1666666667      83      < 0.1%
Other values (5)  198     0.1%

Minimum 5 values

Value         Count   Frequency (%)
0             238163  84.2%
0.1111111111  2       < 0.1%
0.1666666667  83      < 0.1%
0.2           111     < 0.1%
0.25          1300    0.5%

Maximum 5 values

Value         Count  Frequency (%)
1             20311  7.2%
0.8333333333  74     < 0.1%
0.8           23     < 0.1%
0.75          126    < 0.1%
0.6666666667  2388   0.8%

PhysicalSciences
Real number (ℝ≥0)

ZEROS

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1502119242
Minimum: 0
Maximum: 1
Zeros: 219268
Zeros (%): 77.5%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 1
Range: 1
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.3103166229
Coefficient of variation (CV): 2.065858783
Kurtosis: 2.320862641
Mean: 0.1502119242
Median Absolute Deviation (MAD): 0
Skewness: 1.934476673
Sum: 42510.5754
Variance: 0.09629640644
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
Common values

Value             Count   Frequency (%)
0                 219268  77.5%
1                 25257   8.9%
0.5               17552   6.2%
0.3333333333      10461   3.7%
0.6666666667      5227    1.8%
0.25              3229    1.1%
0.2               866     0.3%
0.4               791     0.3%
0.75              178     0.1%
0.6               68      < 0.1%
Other values (6)  107     < 0.1%

Minimum 5 values

Value         Count   Frequency (%)
0             219268  77.5%
0.1428571429  40      < 0.1%
0.1666666667  36      < 0.1%
0.2           866     0.3%
0.2222222222  2       < 0.1%

Maximum 5 values

Value         Count  Frequency (%)
1             25257  8.9%
0.8           27     < 0.1%
0.75          178    0.1%
0.6666666667  5227   1.8%
0.6           68     < 0.1%

SocialSciences
Real number (ℝ≥0)

ZEROS

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.02000337703
Minimum: 0
Maximum: 1
Zeros: 274870
Zeros (%): 97.1%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 1
Range: 1
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.126590409
Coefficient of variation (CV): 6.32845188
Kurtosis: 47.35204179
Mean: 0.02000337703
Median Absolute Deviation (MAD): 0
Skewness: 6.834607574
Sum: 5661.035714
Variance: 0.01602513165
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)
Common values

Value             Count   Frequency (%)
0                 274870  97.1%
1                 3748    1.3%
0.5               2258    0.8%
0.3333333333      1045    0.4%
0.25              616     0.2%
0.6666666667      333     0.1%
0.2               38      < 0.1%
0.75              29      < 0.1%
0.6               28      < 0.1%
0.4               22      < 0.1%
Other values (4)  17      < 0.1%

Minimum 5 values

Value         Count   Frequency (%)
0             274870  97.1%
0.1428571429  1       < 0.1%
0.1666666667  10      < 0.1%
0.2           38      < 0.1%
0.25          616     0.2%

Maximum 5 values

Value         Count  Frequency (%)
1             3748   1.3%
0.8           1      < 0.1%
0.75          29     < 0.1%
0.6666666667  333    0.1%
0.6           28     < 0.1%

Technology
Real number (ℝ≥0)

ZEROS

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.717984543
Minimum: 0
Maximum: 1
Zeros: 54286
Zeros (%): 19.2%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.5
median: 1
Q3: 1
95-th percentile: 1
Maximum: 1
Range: 1
Interquartile range (IQR): 0.5

Descriptive statistics

Standard deviation: 0.3980375418
Coefficient of variation (CV): 0.55438177
Kurtosis: -0.766263932
Mean: 0.717984543
Median Absolute Deviation (MAD): 0
Skewness: -0.958126331
Sum: 203192.4976
Variance: 0.1584338847
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=18)
Common values

Value             Count   Frequency (%)
1                 175142  61.9%
0                 54286   19.2%
0.5               27217   9.6%
0.6666666667      12241   4.3%
0.3333333333      7960    2.8%
0.75              3376    1.2%
0.6               780     0.3%
0.25              765     0.3%
0.2               729     0.3%
0.8               179     0.1%
Other values (8)  329     0.1%

Minimum 5 values

Value         Count  Frequency (%)
0             54286  19.2%
0.1428571429  1      < 0.1%
0.1666666667  74     < 0.1%
0.2           729    0.3%
0.25          765    0.3%

Maximum 5 values

Value         Count   Frequency (%)
1             175142  61.9%
0.8571428571  40      < 0.1%
0.8333333333  21      < 0.1%
0.8           179     0.1%
0.75          3376    1.2%
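One pattern worth noting across the five research-area variables (ArtsHumanities, LifeSciencesBiomedicine, PhysicalSciences, SocialSciences, Technology): their reported means add up to almost exactly 1, and their distinct values are all simple fractions. This is consistent with each record's areas being expressed as shares that sum to 1, but that reading is an interpretation of the numbers, not something the report states. The sketch below only checks the arithmetic.

```python
from fractions import Fraction

# Means copied from the five research-area sections of this report.
means = {
    "ArtsHumanities": 0.0005465428387,
    "LifeSciencesBiomedicine": 0.1112536129,
    "PhysicalSciences": 0.1502119242,
    "SocialSciences": 0.02000337703,
    "Technology": 0.717984543,
}
total_share = sum(means.values())


def as_fraction(value, max_denominator=10):
    """Closest fraction with a small denominator (the bound 10 is an assumption)."""
    return Fraction(value).limit_denominator(max_denominator)


print(f"sum of means: {total_share:.9f}")
for printed in (0.3333333333, 0.1428571429, 0.8571428571, 0.75):
    print(printed, "is approximately", as_fraction(printed))
```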

ComputerScience
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 MiB

Most frequent values:
0: 162362
1: 120642

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 283004
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common values

Value  Count   Frequency (%)
0      162362  57.4%
1      120642  42.6%
Histogram of lengths of the category
ValueCountFrequency (%)
0162362
57.4%
1120642
42.6%

Most occurring characters

ValueCountFrequency (%)
0162362
57.4%
1120642
42.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number283004
100.0%

Most frequent character per category

ValueCountFrequency (%)
0162362
57.4%
1120642
42.6%

Most occurring scripts

ValueCountFrequency (%)
Common283004
100.0%

Most frequent character per script

ValueCountFrequency (%)
0162362
57.4%
1120642
42.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII283004
100.0%

Most frequent character per block

ValueCountFrequency (%)
0162362
57.4%
1120642
42.6%

Health
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 MiB

Most frequent values:
0: 252025
1: 30979

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 283004
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common values

Value  Count   Frequency (%)
0      252025  89.1%
1      30979   10.9%
Histogram of lengths of the category
ValueCountFrequency (%)
0252025
89.1%
130979
 
10.9%

Most occurring characters

ValueCountFrequency (%)
0252025
89.1%
130979
 
10.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number283004
100.0%

Most frequent character per category

ValueCountFrequency (%)
0252025
89.1%
130979
 
10.9%

Most occurring scripts

ValueCountFrequency (%)
Common283004
100.0%

Most frequent character per script

ValueCountFrequency (%)
0252025
89.1%
130979
 
10.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII283004
100.0%

Most frequent character per block

ValueCountFrequency (%)
0252025
89.1%
130979
 
10.9%

NR
Real number (ℝ≥0)

Distinct: 402
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 30.45958361
Minimum: 1
Maximum: 1972
Zeros: 0
Zeros (%): 0.0%
Memory size: 552.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 7
Q1: 14
median: 24
Q3: 39
95-th percentile: 72
Maximum: 1972
Range: 1971
Interquartile range (IQR): 25

Descriptive statistics

Standard deviation: 27.88476565
Coefficient of variation (CV): 0.9154677231
Kurtosis: 233.3866771
Mean: 30.45958361
Median Absolute Deviation (MAD): 11
Skewness: 7.500231363
Sum: 8620184
Variance: 777.5601556
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count   Frequency (%)
12                  8112    2.9%
15                  8110    2.9%
10                  7872    2.8%
16                  7780    2.7%
14                  7747    2.7%
11                  7705    2.7%
13                  7552    2.7%
18                  7341    2.6%
20                  7321    2.6%
17                  7296    2.6%
Other values (392)  206168  72.8%

Minimum 5 values

Value  Count  Frequency (%)
1      247    0.1%
2      657    0.2%
3      1481   0.5%
4      2591   0.9%
5      4005   1.4%

Maximum 5 values

Value  Count  Frequency (%)
1972   2      < 0.1%
1073   1      < 0.1%
882    1      < 0.1%
863    3      < 0.1%
770    1      < 0.1%

TCperYear
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 3692
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.789662283
Minimum: 0
Maximum: 1587.8
Zeros: 77342
Zeros (%): 27.3%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0.5
Q3: 1.692307692
95-th percentile: 7
Maximum: 1587.8
Range: 1587.8
Interquartile range (IQR): 1.692307692

Descriptive statistics

Standard deviation: 7.703522293
Coefficient of variation (CV): 4.304455854
Kurtosis: 14867.77821
Mean: 1.789662283
Median Absolute Deviation (MAD): 0.5
Skewness: 89.11640174
Sum: 506481.5848
Variance: 59.34425572
Monotonicity: Not monotonic
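For a distribution this skewed, the median absolute deviation (MAD) is far less sensitive to the extreme values than the standard deviation (0.5 versus about 7.7 here). A minimal standard-library sketch of the unscaled MAD follows; the helper name is hypothetical, and it is an assumption that the report uses this unscaled form rather than one multiplied by a normal-consistency constant.

```python
from statistics import median


def median_abs_deviation(values):
    """Unscaled MAD: median of absolute deviations from the median."""
    center = median(values)
    return median(abs(v - center) for v in values)


# Toy data shaped loosely like TCperYear: many zeros, one large outlier.
toy = [0, 0, 0.5, 1, 7]
mad = median_abs_deviation(toy)
print(mad)
```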
Histogram with fixed size bins (bins=50)
Common values

Value                Count   Frequency (%)
0                    77342   27.3%
0.5                  10805   3.8%
1                    10486   3.7%
0.3333333333         7399    2.6%
0.25                 5098    1.8%
2                    4904    1.7%
0.6666666667         4689    1.7%
1.5                  4021    1.4%
0.2                  3696    1.3%
0.1666666667         2932    1.0%
Other values (3682)  151632  53.6%

Minimum 5 values

Value          Count  Frequency (%)
0              77342  27.3%
0.03333333333  6      < 0.1%
0.03448275862  27     < 0.1%
0.03571428571  31     < 0.1%
0.03703703704  50     < 0.1%

Maximum 5 values

Value   Count  Frequency (%)
1587.8  2      < 0.1%
932.6   1      < 0.1%
853.5   1      < 0.1%
746     1      < 0.1%
558.6   1      < 0.1%

NumAuthors
Real number (ℝ≥0)

SKEWED

Distinct: 134
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.71077794
Minimum: 0
Maximum: 3049
Zeros: 1
Zeros (%): < 0.1%
Memory size: 552.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
median: 3
Q3: 4
95-th percentile: 7
Maximum: 3049
Range: 3049
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 125.4928861
Coefficient of variation (CV): 11.71650526
Kurtosis: 460.1208039
Mean: 10.71077794
Median Absolute Deviation (MAD): 1
Skewness: 21.01065718
Sum: 3031193
Variance: 15748.46446
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
3                   74730  26.4%
2                   65484  23.1%
4                   55815  19.7%
5                   31740  11.2%
1                   18915  6.7%
6                   16026  5.7%
7                   7733   2.7%
8                   4015   1.4%
9                   2226   0.8%
10                  1352   0.5%
Other values (124)  4968   1.8%

Minimum 5 values

Value  Count  Frequency (%)
0      1      < 0.1%
1      18915  6.7%
2      65484  23.1%
3      74730  26.4%
4      55815  19.7%

Maximum 5 values

Value  Count  Frequency (%)
3049   39     < 0.1%
3041   39     < 0.1%
2908   41     < 0.1%
2900   42     < 0.1%
2888   41     < 0.1%

Organisation
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 276.6 KiB

Most frequent values:
Academia: 272295
Company: 10709

Length

Max length: 8
Median length: 8
Mean length: 7.962159545
Min length: 7
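With only two categories, the length statistics can be recomputed by hand from the value counts. The sketch below uses the counts printed in this section and reproduces the reported mean length and total character count.

```python
# Value counts copied from the Organisation section.
value_counts = {"Academia": 272295, "Company": 10709}

n_rows = sum(value_counts.values())
total_characters = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_characters / n_rows

print(f"rows: {n_rows}")
print(f"total characters: {total_characters}")
print(f"mean length: {mean_length:.9f}")
```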

Characters and Unicode

Total characters: 2253323
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Academia
2nd row: Company
3rd row: Academia
4th row: Academia
5th row: Academia

Common values

Value     Count   Frequency (%)
Academia  272295  96.2%
Company   10709   3.8%
Histogram of lengths of the category
ValueCountFrequency (%)
academia272295
96.2%
company10709
 
3.8%

Most occurring characters

ValueCountFrequency (%)
a555299
24.6%
m283004
12.6%
A272295
12.1%
c272295
12.1%
d272295
12.1%
e272295
12.1%
i272295
12.1%
C10709
 
0.5%
o10709
 
0.5%
p10709
 
0.5%
Other values (2)21418
 
1.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1970319
87.4%
Uppercase Letter283004
 
12.6%

Most frequent character per category

ValueCountFrequency (%)
a555299
28.2%
m283004
14.4%
c272295
13.8%
d272295
13.8%
e272295
13.8%
i272295
13.8%
o10709
 
0.5%
p10709
 
0.5%
n10709
 
0.5%
y10709
 
0.5%
ValueCountFrequency (%)
A272295
96.2%
C10709
 
3.8%

Most occurring scripts

ValueCountFrequency (%)
Latin2253323
100.0%

Most frequent character per script

ValueCountFrequency (%)
a555299
24.6%
m283004
12.6%
A272295
12.1%
c272295
12.1%
d272295
12.1%
e272295
12.1%
i272295
12.1%
C10709
 
0.5%
o10709
 
0.5%
p10709
 
0.5%
Other values (2)21418
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2253323
100.0%

Most frequent character per block

ValueCountFrequency (%)
a555299
24.6%
m283004
12.6%
A272295
12.1%
c272295
12.1%
d272295
12.1%
e272295
12.1%
i272295
12.1%
C10709
 
0.5%
o10709
 
0.5%
p10709
 
0.5%
Other values (2)21418
 
1.0%

Region
Categorical

Distinct: 9
Distinct (%): < 0.1%
Missing: 605
Missing (%): 0.2%
Memory size: 276.9 KiB

Most frequent values:
NorthEast Asia: 83987
Western Europe: 63736
North America: 47064
Eastern Europe to Central Asia: 21829
MiddleEast and North Africa: 19708
Other values (4): 46075

Length

Max length: 30
Median length: 14
Mean length: 17.04277636
Min length: 10

Characters and Unicode

Total characters: 4812863
Distinct characters: 27
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: North America
2nd row: SouthEast Asia and Pacific
3rd row: Western Europe
4th row: Western Europe
5th row: North America

Common values

Value                           Count  Frequency (%)
NorthEast Asia                  83987  29.7%
Western Europe                  63736  22.5%
North America                   47064  16.6%
Eastern Europe to Central Asia  21829  7.7%
MiddleEast and North Africa     19708  7.0%
SouthEast Asia and Pacific      18570  6.6%
South Asia                      15461  5.5%
Latin America and Caribbean     10189  3.6%
Sub Saharan Africa              1855   0.7%
(Missing)                       605    0.2%
Histogram of lengths of the category
ValueCountFrequency (%)
asia139847
19.2%
europe85565
11.7%
northeast83987
11.5%
north66772
9.2%
western63736
8.7%
america57253
7.9%
and48467
 
6.6%
central21829
 
3.0%
to21829
 
3.0%
eastern21829
 
3.0%
Other values (9)117960
16.2%

Most occurring characters

ValueCountFrequency (%)
a487755
 
10.1%
446675
 
9.3%
t446467
 
9.3%
r434578
 
9.0%
s347677
 
7.2%
e343845
 
7.1%
i295889
 
6.1%
o292184
 
6.1%
E229659
 
4.8%
A218663
 
4.5%
Other values (17)1269471
26.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter3585145
74.5%
Uppercase Letter781043
 
16.2%
Space Separator446675
 
9.3%

Most frequent character per category

ValueCountFrequency (%)
a487755
13.6%
t446467
12.5%
r434578
12.1%
s347677
9.7%
e343845
9.6%
i295889
8.3%
o292184
8.1%
h186645
 
5.2%
n178094
 
5.0%
u121451
 
3.4%
Other values (7)450560
12.6%
ValueCountFrequency (%)
E229659
29.4%
A218663
28.0%
N150759
19.3%
W63736
 
8.2%
S37741
 
4.8%
C32018
 
4.1%
M19708
 
2.5%
P18570
 
2.4%
L10189
 
1.3%
ValueCountFrequency (%)
446675
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin4366188
90.7%
Common446675
 
9.3%

Most frequent character per script

ValueCountFrequency (%)
a487755
11.2%
t446467
10.2%
r434578
10.0%
s347677
 
8.0%
e343845
 
7.9%
i295889
 
6.8%
o292184
 
6.7%
E229659
 
5.3%
A218663
 
5.0%
h186645
 
4.3%
Other values (16)1082826
24.8%
ValueCountFrequency (%)
446675
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII4812863
100.0%

Most frequent character per block

ValueCountFrequency (%)
a487755
 
10.1%
446675
 
9.3%
t446467
 
9.3%
r434578
 
9.0%
s347677
 
7.2%
e343845
 
7.1%
i295889
 
6.1%
o292184
 
6.1%
E229659
 
4.8%
A218663
 
4.5%
Other values (17)1269471
26.4%

Country
Categorical

HIGH CARDINALITY

Distinct: 177
Distinct (%): 0.1%
Missing: 2
Missing (%): < 0.1%
Memory size: 559.2 KiB

Most frequent values:
China: 56485
USA: 38625
India: 13792
United Kingdom: 13719
Japan: 11061
Other values (172): 149320

Length

Max length: 38
Median length: 6
Mean length: 7.204942015
Min length: 3

Characters and Unicode

Total characters: 2039013
Distinct characters: 54
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 18
Unique (%): < 0.1%

Sample

1st row: USA
2nd row: Australia
3rd row: Belgium
4th row: France
5th row: USA

Common values

Value                      Count   Frequency (%)
China                      56485   20.0%
USA                        38625   13.6%
India                      13792   4.9%
United Kingdom             13719   4.8%
Japan                      11061   3.9%
Iran, Islamic Republic of  10407   3.7%
Canada                     8589    3.0%
Taiwan                     8443    3.0%
Germany                    8381    3.0%
Italy                      8318    2.9%
Other values (167)         105182  37.2%
Histogram of lengths of the category
ValueCountFrequency (%)
china56485
 
16.1%
usa38625
 
11.0%
republic19853
 
5.6%
of18235
 
5.2%
united14297
 
4.1%
india13792
 
3.9%
kingdom13719
 
3.9%
japan11061
 
3.1%
islamic10407
 
3.0%
iran10407
 
3.0%
Other values (194)144923
41.2%

Most occurring characters

ValueCountFrequency (%)
a285136
 
14.0%
i198900
 
9.8%
n193990
 
9.5%
e106893
 
5.2%
r81634
 
4.0%
l77720
 
3.8%
d69429
 
3.4%
C69083
 
3.4%
68802
 
3.4%
h64969
 
3.2%
Other values (44)822457
40.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1541367
75.6%
Uppercase Letter410582
 
20.1%
Space Separator68802
 
3.4%
Other Punctuation18261
 
0.9%
Dash Punctuation1
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
a285136
18.5%
i198900
12.9%
n193990
12.6%
e106893
 
6.9%
r81634
 
5.3%
l77720
 
5.0%
d69429
 
4.5%
h64969
 
4.2%
o63758
 
4.1%
u47393
 
3.1%
Other values (16)351545
22.8%
ValueCountFrequency (%)
C69083
16.8%
S59094
14.4%
U53661
13.1%
A51318
12.5%
I46281
11.3%
R24179
 
5.9%
K22107
 
5.4%
T17253
 
4.2%
J11494
 
2.8%
G11474
 
2.8%
Other values (14)44638
10.9%
ValueCountFrequency (%)
,18235
99.9%
'26
 
0.1%
ValueCountFrequency (%)
68802
100.0%
ValueCountFrequency (%)
-1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1951949
95.7%
Common87064
 
4.3%

Most frequent character per script

ValueCountFrequency (%)
a285136
 
14.6%
i198900
 
10.2%
n193990
 
9.9%
e106893
 
5.5%
r81634
 
4.2%
l77720
 
4.0%
d69429
 
3.6%
C69083
 
3.5%
h64969
 
3.3%
o63758
 
3.3%
Other values (40)740437
37.9%
ValueCountFrequency (%)
68802
79.0%
,18235
 
20.9%
'26
 
< 0.1%
-1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII2039013
100.0%

Most frequent character per block

ValueCountFrequency (%)
a285136
 
14.0%
i198900
 
9.8%
n193990
 
9.5%
e106893
 
5.2%
r81634
 
4.0%
l77720
 
3.8%
d69429
 
3.4%
C69083
 
3.4%
68802
 
3.4%
h64969
 
3.2%
Other values (44)822457
40.3%

CountryCode
Categorical

HIGH CARDINALITY

Distinct: 177
Distinct (%): 0.1%
Missing: 2
Missing (%): < 0.1%
Memory size: 559.2 KiB

Most frequent values:
CHN: 56485
USA: 38625
IND: 13792
GBR: 13719
JPN: 11061
Other values (172): 149320

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 849006
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 18
Unique (%): < 0.1%

Sample

1st row: USA
2nd row: AUS
3rd row: BEL
4th row: FRA
5th row: USA

Common values

Value               Count   Frequency (%)
CHN                 56485   20.0%
USA                 38625   13.6%
IND                 13792   4.9%
GBR                 13719   4.8%
JPN                 11061   3.9%
IRN                 10407   3.7%
CAN                 8589    3.0%
TWN                 8443    3.0%
DEU                 8381    3.0%
ITA                 8318    2.9%
Other values (167)  105182  37.2%
Histogram of lengths of the category
ValueCountFrequency (%)
chn56485
20.0%
usa38625
 
13.6%
ind13792
 
4.9%
gbr13719
 
4.8%
jpn11061
 
3.9%
irn10407
 
3.7%
can8589
 
3.0%
twn8443
 
3.0%
deu8381
 
3.0%
ita8318
 
2.9%
Other values (167)105182
37.2%

Most occurring characters

ValueCountFrequency (%)
N120288
14.2%
A83882
9.9%
C73916
 
8.7%
S70396
 
8.3%
U70343
 
8.3%
R67542
 
8.0%
H62773
 
7.4%
I37542
 
4.4%
P30384
 
3.6%
T29923
 
3.5%
Other values (16)202017
23.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter849006
100.0%

Most frequent character per category

ValueCountFrequency (%)
N120288
14.2%
A83882
9.9%
C73916
 
8.7%
S70396
 
8.3%
U70343
 
8.3%
R67542
 
8.0%
H62773
 
7.4%
I37542
 
4.4%
P30384
 
3.6%
T29923
 
3.5%
Other values (16)202017
23.8%

Most occurring scripts

ValueCountFrequency (%)
Latin849006
100.0%

Most frequent character per script

ValueCountFrequency (%)
N120288
14.2%
A83882
9.9%
C73916
 
8.7%
S70396
 
8.3%
U70343
 
8.3%
R67542
 
8.0%
H62773
 
7.4%
I37542
 
4.4%
P30384
 
3.6%
T29923
 
3.5%
Other values (16)202017
23.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII849006
100.0%

Most frequent character per block

ValueCountFrequency (%)
N120288
14.2%
A83882
9.9%
C73916
 
8.7%
S70396
 
8.3%
U70343
 
8.3%
R67542
 
8.0%
H62773
 
7.4%
I37542
 
4.4%
P30384
 
3.6%
T29923
 
3.5%
Other values (16)202017
23.8%

Interactions

[Figures: interaction plots, rendered with Matplotlib v3.3.2]
2021-01-11T17:16:28.108547image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-01-11T17:16:28.284029image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-01-11T17:16:28.464679image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-01-11T17:16:28.651867image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that, for a linear relationship, the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
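
As a minimal sketch of that calculation, the following pure-Python function divides the covariance by the product of the standard deviations (the shared 1/n factors cancel, so sums of squares are used directly). The function name `pearson_r` and the toy values are illustrative assumptions, not part of the report or of pandas-profiling.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Unnormalised covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: a shifted and rescaled copy of x is still perfectly correlated with x.
x = [1998, 2001, 2009, 2015, 2018]
y = [2.0 * v + 10.0 for v in x]
print(pearson_r(x, y))
```

Shifting or rescaling either variable leaves the result unchanged, which is the invariance property described above.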

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
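
The rank-based calculation can be sketched as follows: replace each value by its rank, then apply the Pearson formula to the ranks. Assigning tied values the average of their ranks is an assumption of this sketch (it is a common convention), and the helper names `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

For this monotonic but nonlinear relationship the rank-based coefficient reaches its maximum while Pearson's r stays below it.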

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
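
A minimal sketch of the pair-counting formula as stated is shown below. It implements the plain version (often called τ-a), in which tied pairs count as neither concordant nor discordant; library implementations frequently use a tie-adjusted variant instead, so values may differ when the data contain ties. The function name `kendall_tau_a` is a hypothetical choice.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        pairs += 1
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables order the pair the same way
        elif sign < 0:
            discordant += 1  # the variables order the pair in opposite ways
    if pairs == 0:
        raise ValueError("at least two observations are required")
    return (concordant - discordant) / pairs


# Hypothetical data: two of the three pairs are concordant, one is discordant.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The double loop over pairs is quadratic in the number of observations, which is acceptable for illustration but slow for a dataset of this size.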

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
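
The bias correction can be sketched in pure Python as below, assuming the commonly cited form of Bergsma's correction: the chi-squared statistic divided by n is reduced by (k-1)(r-1)/(n-1) and floored at zero, and the row and column counts r and k are shrunk before normalising. The function name `cramers_v_corrected` and the category labels are illustrative assumptions; this is not taken from the pandas-profiling source.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of category labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)
    if n < 2 or r < 2 or k < 2:
        raise ValueError("need at least two observations and two categories per variable")

    # Pearson chi-squared statistic over every cell, including empty ones.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Hypothetical labels: the second variable is fully determined by the first.
organisation = ["Academia", "Company"] * 50
sector = ["public", "private"] * 50
print(cramers_v_corrected(organisation, sector))
```

With independent labels the corrected statistic is clipped to 0 rather than taking a small positive value, which is the practical effect of the correction.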

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
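
The nullity correlation shown in the heatmap can be illustrated with a small sketch, assuming it is the Pearson correlation between per-column missing-value indicators (1 where a value is missing, 0 where it is present). The function name `nullity_correlation` and the toy columns are hypothetical, and `None` stands in for a missing cell.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns.

    Returns None when either column is entirely present or entirely missing,
    because the indicator is then constant and the correlation is undefined.
    """
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


# Hypothetical columns: the two fields are always missing in the same rows.
country = ["USA", None, "France", None, "India"]
country_code = ["USA", None, "FRA", None, "IND"]
print(nullity_correlation(country, country_code))
```

A value near +1 means two columns tend to be missing in the same rows, a value near -1 means one tends to be present exactly where the other is missing.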

Sample

First rows

 | PY | SC | ArtsHumanities | LifeSciencesBiomedicine | PhysicalSciences | SocialSciences | Technology | ComputerScience | Health | NR | TCperYear | NumAuthors | Organisation | Region | Country | CountryCode
0 | 1998 | Computer Science; Engineering | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1 | 0 | 13 | 0.545455 | 3 | Academia | North America | USA | USA
1 | 1998 | Computer Science; Engineering | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1 | 0 | 24 | 0.363636 | 2 | Company | SouthEast Asia and Pacific | Australia | AUS
2 | 1998 | Computer Science; Engineering | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1 | 0 | 13 | 3.454545 | 4 | Academia | Western Europe | Belgium | BEL
3 | 1998 | Computer Science; Engineering | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1 | 0 | 13 | 3.454545 | 4 | Academia | Western Europe | France | FRA
4 | 1998 | Computer Science; Engineering | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1 | 0 | 12 | 0.136364 | 2 | Academia | North America | USA | USA
5 | 1998 | Computer Science; Engineering | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1 | 0 | 19 | 1.772727 | 2 | Company | South Asia | India | IND
6 | 1998 | Computer Science; Engineering | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1 | 0 | 34 | 1.181818 | 2 | Academia | North America | USA | USA
7 | 1998 | Computer Science; Engineering | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1 | 0 | 23 | 0.181818 | 3 | Academia | Western Europe | United Kingdom | GBR
8 | 1998 | Computer Science; Engineering | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1 | 0 | 14 | 3.136364 | 3 | Company | Western Europe | Germany | DEU
9 | 1998 | Computer Science; Engineering | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1 | 0 | 17 | 0.409091 | 3 | Academia | North America | USA | USA

Last rows

 | PY | SC | ArtsHumanities | LifeSciencesBiomedicine | PhysicalSciences | SocialSciences | Technology | ComputerScience | Health | NR | TCperYear | NumAuthors | Organisation | Region | Country | CountryCode
282994 | 2017 | General & Internal Medicine | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0 | 1 | 21 | 0.666667 | 2 | Academia | Western Europe | United Kingdom | GBR
282995 | 2018 | Business & Economics; Biomedical Social Sciences | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0 | 0 | 3 | 0.000000 | 1 | Academia | NorthEast Asia | China | CHN
282996 | 2016 | Cultural Studies | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0 | 0 | 32 | 0.000000 | 1 | Academia | SouthEast Asia and Pacific | Australia | AUS
282997 | 2018 | History | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 66 | 0.000000 | 1 | Academia | Western Europe | Netherlands | NLD
282998 | 2015 | Business & Economics | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0 | 0 | 53 | 0.200000 | 1 | Academia | Western Europe | United Kingdom | GBR
282999 | 2015 | Business & Economics | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0 | 0 | 33 | 0.200000 | 2 | Academia | North America | Canada | CAN
283000 | 2009 | Business & Economics | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0 | 0 | 60 | 0.000000 | 2 | Academia | North America | USA | USA
283001 | 2001 | Literature | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 22 | 2.368421 | 2 | Academia | North America | USA | USA
283002 | 2018 | Literature | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 34 | 0.000000 | 1 | Academia | Western Europe | Switzerland | CHE
283003 | 2017 | Literature | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 45 | 0.000000 | 1 | Academia | Western Europe | United Kingdom | GBR